Revealing Disease Similarities by Text Mining

Authors

  • Alberto Calderone
  • Luana Licata
  • Elisa Micarelli
  • Livia Perfetto
  • Gianni Cesareni
Abstract

Texts written in human language contain structured information that is not easily parsable by computers. Text mining relies on large text corpora to derive rules that can be applied automatically to extract such information. Scientific literature is the main source of information for studying any biological phenomenon. While some phenomena are studied so extensively that corpora can actually be built, the scientific literature describing rare diseases is scarce, which poses an even bigger challenge for automatic approaches. To tackle this problem, the ELIXIR infrastructure is supporting various initiatives for data integration in different fields of the life sciences, including rare diseases, which will pave the way for the development of dedicated software. In this work we present a tool that applies a text-mining strategy to multiple text sets and merges the individual results in order to infer connections that are not explicitly stated in the texts.
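
The abstract does not say how the per-set results are merged. As a minimal sketch, assuming each disease has its own small text set and using a plain bag-of-words profile compared by cosine similarity (a generic technique swapped in for illustration, not the authors' method), mining each set independently and then comparing the results could look like this. The disease names, sentences, and stopword list are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpora: one small text set per disease (illustrative only).
CORPORA = {
    "disease_A": ["Mutations in GENE1 impair mitochondrial function.",
                  "Patients show muscle weakness and GENE1 variants."],
    "disease_B": ["GENE1 loss disrupts mitochondrial respiration.",
                  "Muscle weakness is a common symptom."],
    "disease_C": ["Skin lesions are linked to GENE9 overexpression."],
}

# Hypothetical minimal stopword list; a real pipeline would use a curated one.
STOPWORDS = {"in", "and", "is", "a", "to", "are", "the", "of", "show"}


def term_profile(texts):
    """Mine one text set on its own: count lower-cased tokens, minus stopwords."""
    counts = Counter()
    for text in texts:
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            if token not in STOPWORDS:
                counts[token] += 1
    return counts


def cosine(p, q):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(p[t] * q[t] for t in p.keys() & q.keys())
    norm = (math.sqrt(sum(v * v for v in p.values()))
            * math.sqrt(sum(v * v for v in q.values())))
    return dot / norm if norm else 0.0


def disease_similarities(corpora):
    """Merge the per-set results by scoring every pair of diseases."""
    profiles = {name: term_profile(texts) for name, texts in corpora.items()}
    names = sorted(profiles)
    scores = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            scores[(a, b)] = cosine(profiles[a], profiles[b])
    return scores


if __name__ == "__main__":
    ranked = sorted(disease_similarities(CORPORA).items(), key=lambda kv: -kv[1])
    for pair, score in ranked:
        print(pair, round(score, 3))
```

Neither toy text set for disease_A or disease_B mentions the other disease, yet their profiles share terms such as gene1, mitochondrial, and muscle, so the merged comparison surfaces a link that no single text states explicitly.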

Similar resources

Document Clustering Based on Ontology and a Fuzzy Approach

Data mining, also known as knowledge discovery in databases, is the process of discovering unknown knowledge from a large amount of data. Text mining applies data mining techniques to extract knowledge from unstructured text. Text clustering is one of the important techniques of text mining, which is the unsupervised classification of similar documents into different groups. The most important step...

An Incremental Algorithm to find Asymmetric Word Similarities for Fuzzy Text Mining

Synonymy – different words with the same meaning – is a major problem for text mining systems. We have proposed asymmetric word similarities as a possible solution to this problem, where the similarity between words is computed on the basis of the similarities between contexts in which the words appear, rather than on their syntactic identity. In this paper, we give details of an incremental al...

A Model for Extracting Information from Textual Documents Based on Text Mining in the Field of E-Learning

As computer networks become the backbones of science and economy, enormous quantities of documents become available. Text mining techniques have therefore been used to extract useful information from textual data. Text mining has become an important research area that discovers unknown information, facts, or new hypotheses by automatically extracting information from different written documents. T...

A generalized Framework of Privacy Preservation in Distributed Data mining for Unstructured Data Environment

The management of unstructured data is recognized as one of the major unsolved problems in the information industry and the data mining paradigm. Unstructured data is computerized information that either does not have a data model or is not easily usable by data mining. This paper proposes a solution to this problem by converting unstructured data into structured data using legacy system and...

Mining at Detail Level Using Conceptual Graphs *

Text mining is defined as knowledge discovery in large text collections. It detects interesting patterns such as clusters, associations, deviations, similarities, and differences in sets of texts. Current text mining methods use simplistic representations of text contents, such as keyword vectors, which imply serious limitations on the kind and meaningfulness of possible discoveries. We show ho...

Publication date: 2017